Prediction of O-linked glycosylation sites in amino acid sequences by LDA-Normal Bayes approach
Abstract
O-linked glycosylation is one of the important types of mammalian protein glycosylation and is known to be serine- or threonine-specific. In this work, the author proposes a new method based on Linear Discriminant Analysis (LDA), using the concept of Normal Bayes classification, to predict O-glycosylation sites in amino acid sequences under two different window sizes, 11 and 21. The protein sequences are encoded using a hydropathy encoding technique. The prediction is treated as a two-class (positive and negative) classification problem. The classification algorithm is designed by combining the concepts of Linear Discriminant Analysis and Normal Bayes classification. Using these techniques, the data sets are transformed into discriminant functions, and the protein sequence to be predicted is assigned to the class with the maximum discriminant-function value. The prediction accuracy for non-O-linked glycosylation sites (negative sites) is about 76-94%, and the accuracy for O-linked glycosylation sites (positive sites) is about 90-98%.
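The pipeline described above (window extraction around Ser/Thr, hydropathy encoding, linear discriminant functions, arg-max assignment) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes the Kyte-Doolittle scale as the "hydropathy encoding", zero padding for window positions past either end of the sequence, and a reading of "LDA with Normal Bayes" as a Gaussian Bayes classifier with one pooled covariance matrix, which makes every discriminant function linear. The names `candidate_sites`, `encode_window`, `solve`, `NormalBayesLDA`, `PAD_VALUE`, and the `ridge` parameter are all hypothetical.

```python
import math

# ASSUMPTION: the abstract only says "hydropathy encoding"; the
# Kyte-Doolittle scale used here is one common choice, not confirmed.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
PAD_VALUE = 0.0  # hypothetical value for off-sequence or unknown residues


def candidate_sites(seq):
    """Indices of Ser/Thr residues, the only possible O-glycosylation sites."""
    return [i for i, aa in enumerate(seq) if aa in "ST"]


def encode_window(seq, center, window):
    """Hydropathy vector for an odd-length window (e.g. 11 or 21) around center."""
    half = window // 2
    vec = []
    for i in range(center - half, center + half + 1):
        if 0 <= i < len(seq):
            vec.append(KYTE_DOOLITTLE.get(seq[i], PAD_VALUE))
        else:
            vec.append(PAD_VALUE)
    return vec


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class NormalBayesLDA:
    """Gaussian (Normal) Bayes classifier sharing one pooled covariance,
    so each class discriminant g_k(x) is linear in x."""

    def __init__(self, ridge=1e-6):
        self.ridge = ridge  # ASSUMPTION: small diagonal term for invertibility

    def fit(self, xs, ys):
        labels = sorted(set(ys))
        d, n = len(xs[0]), len(xs)
        groups = {k: [x for x, y in zip(xs, ys) if y == k] for k in labels}
        means = {k: [sum(col) / len(g) for col in zip(*g)]
                 for k, g in groups.items()}
        # Pooled within-class covariance, divided by (n - number of classes).
        cov = [[0.0] * d for _ in range(d)]
        for k, g in groups.items():
            for x in g:
                diff = [xi - mi for xi, mi in zip(x, means[k])]
                for i in range(d):
                    for j in range(d):
                        cov[i][j] += diff[i] * diff[j]
        for i in range(d):
            for j in range(d):
                cov[i][j] /= n - len(labels)
            cov[i][i] += self.ridge
        # g_k(x) = w_k . x + b_k, with w_k = S^-1 mu_k and
        # b_k = -0.5 * mu_k . w_k + ln P(k), P(k) = class frequency.
        self.params = {}
        for k in labels:
            w = solve(cov, means[k])
            b = -0.5 * sum(mu * wi for mu, wi in zip(means[k], w))
            b += math.log(len(groups[k]) / n)
            self.params[k] = (w, b)
        return self

    def discriminants(self, x):
        return {k: sum(wi * xi for wi, xi in zip(w, x)) + b
                for k, (w, b) in self.params.items()}

    def predict(self, x):
        """Assign x to the class with the largest discriminant value."""
        scores = self.discriminants(x)
        return max(scores, key=scores.get)
```

In use, one would call `encode_window(seq, i, 11)` (or `21`) for every index returned by `candidate_sites(seq)`, fit `NormalBayesLDA` on labelled positive and negative windows, and read `predict` as the site label. The sketch sticks to the standard library to stay self-contained; in practice a tested implementation such as scikit-learn's `LinearDiscriminantAnalysis` would be the more usual choice.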
Similar resources
Computational Prediction of O-linked Glycosylation Sites That Preferentially Map on Intrinsically Disordered Regions of Extracellular Proteins
O-glycosylation of mammalian proteins is one of the important posttranslational modifications. We applied a support vector machine (SVM) to predict whether Ser or Thr is glycosylated, in order to elucidate the O-glycosylation mechanism. O-glycosylated sites were often found clustered along the sequence, whereas other sites were located sporadically. Therefore, we developed two types of SVMs for...
Prediction of O-Glycosylation Sites in Proteins using PSO-Based Data Balancing and Random Forest
O-glycosylation of mammalian proteins is one of the most important post-translational modifications (PTMs). Hence, there is significant interest in the development of computational methods for reliable prediction of O-Glycosylation sites from amino acid sequences. One particular challenge in training the classifiers comes from the fact that the available dataset is highly imbalanced, which make...
Purification, sequence characterization and effect of goat oviduct-specific glycoprotein on in vitro embryo development.
Oviduct-specific glycoprotein (oviductin) plays an important role during fertilization and early embryonic development. The oviductin cDNA was successfully cloned and sequenced in goat, which possessed an open reading frame of 1620 nucleotides representing 539 amino acids. Predicted amino acid sequence showed very high identity with sheep (97%) followed by cow (94%), porcine (77%), hamster (69%...
Prediction of the O-glycosylation by Support Vector Machine and Independent Component Analysis for amino acid sequence around O-glycosylation
Glycosylation is one of the main topics in understanding the life systems. N-glycosylation and O-glycosylation are the main two types of mammalian protein glycosylation. The binding process and a consensus sequence are clarified for N-glycosylation. On the other hand, though it is known that O-glycosylation is serine (Ser) or threonine (Thr) specific, consensus sequence is still unknown. We appl...
Structural requirements for additional N-linked carbohydrate on recombinant human erythropoietin.
N-Linked glycosylation is a post-translational event whereby carbohydrates are added to secreted proteins at the consensus sequence Asn-Xaa-Ser/Thr, where Xaa is any amino acid except proline. Some consensus sequences in secreted proteins are not glycosylated, indicating that consensus sequences are necessary but not sufficient for glycosylation. In order to understand the structural rules for ...
Publication date: 2011